Results 1 - 20 of 95
1.
Article in English | MEDLINE | ID: mdl-38648123

ABSTRACT

Vulnerability to adversarial attacks is one of the principal hurdles to the adoption of deep learning in safety-critical applications. Despite significant efforts, both practical and theoretical, training deep learning models robust to adversarial attacks is still an open problem. In this article, we analyse the geometry of adversarial attacks in the over-parameterized limit for Bayesian neural networks (BNNs). We show that, in the limit, vulnerability to gradient-based attacks arises as a result of degeneracy in the data distribution, i.e., when the data lie on a lower dimensional submanifold of the ambient space. As a direct consequence, we demonstrate that in this limit, BNN posteriors are robust to gradient-based adversarial attacks. Crucially, by relying on the convergence of infinitely-wide BNNs to Gaussian processes (GPs), we prove that, under certain relatively mild assumptions, the expected gradient of the loss with respect to the BNN posterior distribution is vanishing, even when each NN sampled from the BNN posterior does not have vanishing gradients. The experimental results on the MNIST, Fashion MNIST, and a synthetic dataset with BNNs trained with Hamiltonian Monte Carlo and variational inference support this line of argument, empirically showing that BNNs can display both high accuracy on clean data and robustness to both gradient-based and gradient-free adversarial attacks.

2.
Genome Biol ; 25(1): 55, 2024 Feb 23.
Article in English | MEDLINE | ID: mdl-38395871

ABSTRACT

Multi-omic single-cell technologies, which simultaneously measure the transcriptional and epigenomic state of the same cell, enable understanding epigenetic mechanisms of gene regulation. However, noisy and sparse data pose fundamental statistical challenges to extract biological knowledge from complex datasets. SHARE-Topic, a Bayesian generative model of multi-omic single cell data using topic models, aims to address these challenges. SHARE-Topic identifies common patterns of co-variation between different omic layers, providing interpretable explanations for the data complexity. Tested on data from different technological platforms, SHARE-Topic provides low dimensional representations recapitulating known biology and defines associations between genes and distal regulators in individual cells.


Subjects
Epigenomics; Multiomics; Bayes Theorem; Epigenesis, Genetic
3.
Biophys J ; 123(2): 184-194, 2024 Jan 16.
Article in English | MEDLINE | ID: mdl-38087781

ABSTRACT

Cellular functions crucially depend on the precise execution of complex biochemical reactions taking place on the chromatin fiber in the tightly packed environment of the cell nucleus. Despite the availability of large datasets probing this process from multiple angles, bottom-up frameworks that allow the incorporation of the sequence-specific nature of biochemistry in a unified model of 3D chromatin structure remain scarce. Here, we propose Sequence-Enhanced Magnetic Polymer (SEMPER), a novel stochastic polymer model that naturally incorporates observational data about sequence-driven biochemical processes, such as binding of transcription factor proteins, in a 3D model of chromatin structure. We introduce a novel approximate Bayesian algorithm to quantify a posteriori the relative importance of various factors, including the polymeric nature of DNA, in determining chromatin epigenetic state, thus providing a transparent way to generate biological hypotheses. Although accurate prediction of contact frequencies (a problem already extensively studied in the literature) is not our main aim, as a by-product of the inference procedure and without additional input from the genome 3D structure, our model can predict with reasonable accuracy some notable and nontrivial conformational features of chromatin folding within the nucleus. Our work highlights the importance of introducing physically realistic statistical models for predicting chromatin states from epigenetic data and opens the way to a new class of more systematic approaches to interpreting epigenomic data.


Subjects
Chromatin; Polymers; Bayes Theorem; Chromosomes; Molecular Conformation
4.
J Chem Phys ; 158(11): 114113, 2023 Mar 21.
Article in English | MEDLINE | ID: mdl-36948813

ABSTRACT

The complexity of mathematical models in biology has rendered model reduction an essential tool in the quantitative biologist's toolkit. For stochastic reaction networks described using the Chemical Master Equation, commonly used methods include time-scale separation, Linear Mapping Approximation, and state-space lumping. Despite the success of these techniques, they appear to be rather disparate, and at present, no general-purpose approach to model reduction for stochastic reaction networks is known. In this paper, we show that most common model reduction approaches for the Chemical Master Equation can be seen as minimizing a well-known information-theoretic quantity between the full model and its reduction, the Kullback-Leibler divergence defined on the space of trajectories. This allows us to recast the task of model reduction as a variational problem that can be tackled using standard numerical optimization approaches. In addition, we derive general expressions for propensities of a reduced system that generalize those found using classical methods. We show that the Kullback-Leibler divergence is a useful metric to assess model discrepancy and to compare different model reduction techniques using three examples from the literature: an autoregulatory feedback loop, the Michaelis-Menten enzyme system, and a genetic oscillator.
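As a rough illustration of the quantity this abstract refers to, the sketch below evaluates the standard instantaneous Kullback-Leibler rate between two continuous-time Markov chains that share a finite state space; integrating this rate over time gives the divergence between their trajectory distributions. This is a toy under stated assumptions, not the authors' implementation: the function name `kl_rate`, the list-of-lists rate matrices, and the two-state example are all hypothetical, and a real model reduction would also map the full state space onto a smaller one.

```python
import math


def kl_rate(q_full, q_reduced, p):
    """Instantaneous KL rate between two CTMCs on the same finite state space.

    q_full, q_reduced: square nested lists of transition rates (diagonals ignored).
    p: probability of each state under the full model at the current time.
    Hypothetical helper for illustration only.
    """
    total = 0.0
    n = len(p)
    for x in range(n):
        for y in range(n):
            if x == y:
                continue
            a, b = q_full[x][y], q_reduced[x][y]
            if a > 0 and b == 0:
                # The full model makes a jump the reduced model forbids.
                return math.inf
            # Per-jump contribution: a*log(a/b) - a + b, with 0*log(0) taken as 0.
            term = b - a
            if a > 0:
                term += a * math.log(a / b)
            total += p[x] * term
    return total


# Toy two-state example (assumed rates, not taken from the paper).
full = [[0.0, 2.0], [1.0, 0.0]]
reduced = [[0.0, 1.0], [1.0, 0.0]]
state_probs = [0.5, 0.5]

rate_same = kl_rate(full, full, state_probs)
rate_diff = kl_rate(full, reduced, state_probs)
```

Each per-jump term is non-negative and vanishes only when the two rates agree, which is what makes the rate usable as an objective to minimize over the parameters of a reduced model.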

5.
RNA ; 28(11): 1469-1480, 2022 11.
Article in English | MEDLINE | ID: mdl-36008134

ABSTRACT

RNA-binding proteins (RBPs) are key co- and post-transcriptional regulators of gene expression, playing a crucial role in many biological processes. Experimental methods like CLIP-seq have enabled the identification of transcriptome-wide RNA-protein interactions for select proteins; however, the time- and resource-intensive nature of these technologies calls for the development of computational methods to complement their predictions. Here, we leverage recent, large-scale CLIP-seq experiments to construct a de novo predictor of RNA-protein interactions based on graph neural networks (GNN). We show that the GNN method allows us not only to predict missing links in an RNA-protein network, but to predict the entire complement of targets of previously unassayed proteins, and even to reconstruct the entire network of RNA-protein interactions in different conditions based on minimal information. Our results demonstrate the potential of modern machine learning methods to extract useful information on post-transcriptional regulation from large data sets.


Subjects
Neural Networks, Computer; RNA; Sequence Analysis, RNA/methods; RNA/genetics; RNA/metabolism; RNA-Binding Proteins/genetics; RNA-Binding Proteins/metabolism; Machine Learning
6.
J R Soc Interface ; 19(192): 20220153, 2022 07.
Article in English | MEDLINE | ID: mdl-35858045

ABSTRACT

Estimating uncertainty in model predictions is a central task in quantitative biology. Biological models at the single-cell level are intrinsically stochastic and nonlinear, creating formidable challenges for their statistical estimation which inevitably has to rely on approximations that trade accuracy for tractability. Despite intensive interest, a sweet spot in this trade-off has not been found yet. We propose a flexible procedure for uncertainty quantification in a wide class of reaction networks describing stochastic gene expression including those with feedback. The method is based on creating a tractable coarse-graining of the model that is learned from simulations, a synthetic model, to approximate the likelihood function. We demonstrate that synthetic models can substantially outperform state-of-the-art approaches on a number of non-trivial systems and datasets, yielding an accurate and computationally viable solution to uncertainty quantification in stochastic models of gene expression.


Subjects
Algorithms; Models, Biological; Gene Expression; Stochastic Processes; Uncertainty
7.
PLoS Comput Biol ; 18(6): e1010163, 2022 06.
Article in English | MEDLINE | ID: mdl-35727848

ABSTRACT

Single-cell multi-omics assays offer unprecedented opportunities to explore epigenetic regulation at cellular level. However, high levels of technical noise and data sparsity frequently lead to a lack of statistical power in correlative analyses, identifying very few, if any, significant associations between different molecular layers. Here we propose SCRaPL, a novel computational tool that increases power by carefully modelling noise in the experimental systems. We show on real and simulated multi-omics single-cell data sets that SCRaPL achieves higher sensitivity and better robustness in identifying correlations, while maintaining a similar level of false positives as standard analyses based on Pearson and Spearman correlation.
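For readers unfamiliar with the baseline analyses named in this abstract, the sketch below computes Pearson and Spearman correlation coefficients from scratch. It illustrates only the standard comparison methods, not SCRaPL itself; the function names and the toy vectors are hypothetical and chosen for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation: covariance scaled by the two standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Toy data (assumed): a monotone but non-linear relationship.
layer_a = [1.0, 2.0, 3.0, 4.0, 5.0]
layer_b = [1.0, 4.0, 9.0, 16.0, 25.0]

r_pearson = pearson(layer_a, layer_b)
r_spearman = spearman(layer_a, layer_b)
```

On this toy input the rank-based coefficient reaches its maximum because the relationship is monotone, while the Pearson coefficient stays below it because the relationship is not linear. Neither coefficient models measurement noise explicitly.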


Subjects
Epigenesis, Genetic; Bayes Theorem
8.
Stat Appl Genet Mol Biol ; 21(1)2022 05 02.
Article in English | MEDLINE | ID: mdl-35073469

ABSTRACT

RNA-protein interactions have long been recognised as crucial regulators of gene expression. Recently, the development of scalable experimental techniques to measure these interactions has revolutionised the field, leading to the production of large-scale datasets which offer both opportunities and challenges for machine learning techniques. In this brief note, we will discuss some of the major stumbling blocks towards the use of machine learning in computational RNA biology, focusing specifically on the problem of predicting RNA-protein interactions from next-generation sequencing data.


Subjects
Computational Biology; Machine Learning; Computational Biology/methods; RNA/genetics
9.
Genome Biol ; 22(1): 251, 2021 08 27.
Article in English | MEDLINE | ID: mdl-34452629

ABSTRACT

RNA splicing is an important driver of heterogeneity in single cells through the expression of alternative transcripts and as a determinant of transcriptional kinetics. However, the intrinsic coverage limitations of scRNA-seq technologies make it challenging to associate specific splicing events to cell-level phenotypes. BRIE2 is a scalable computational method that resolves these issues by regressing single-cell transcriptomic data against cell-level features. We show that BRIE2 effectively identifies differential disease-associated alternative splicing events and allows a principled selection of genes that capture heterogeneity in transcriptional kinetics and improve RNA velocity analyses, enabling the identification of splicing phenotypes associated with biological changes.


Subjects
RNA Splicing/genetics; Single-Cell Analysis; Transcriptome/genetics; Bayes Theorem; Computer Simulation; Humans; Multiple Sclerosis/genetics; Phenotype; Protein Isoforms/genetics; Protein Isoforms/metabolism; RNA/metabolism
10.
Genome Biol ; 22(1): 165, 2021 05 27.
Article in English | MEDLINE | ID: mdl-34044851

ABSTRACT

Advancing RNA structural probing techniques with next-generation sequencing has generated demands for complementary computational tools to robustly extract RNA structural information amidst sampling noise and variability. We present diffBUM-HMM, a noise-aware model that enables accurate detection of RNA flexibility and conformational changes from high-throughput RNA structure-probing data. diffBUM-HMM is widely compatible, accounting for sampling variation and sequence coverage biases, and displays higher sensitivity than existing methods while robust against false positives. Our analyses of datasets generated with a variety of RNA probing chemistries demonstrate the value of diffBUM-HMM for quantitatively detecting RNA structural changes and RNA-binding protein binding sites.


Subjects
Algorithms; High-Throughput Nucleotide Sequencing; Markov Chains; Models, Statistical; RNA/chemistry; RNA/genetics; Base Sequence; Binding Sites; Databases, Genetic; Models, Theoretical; Mutation/genetics; Nucleotides/genetics; Protein Binding; RNA Precursors/genetics; RNA, Long Noncoding/genetics; Ribosomes/metabolism
11.
Genome Biol ; 22(1): 114, 2021 04 20.
Article in English | MEDLINE | ID: mdl-33879195

ABSTRACT

High-throughput single-cell measurements of DNA methylomes can quantify methylation heterogeneity and uncover its role in gene regulation. However, technical limitations and sparse coverage can preclude this task. scMET is a hierarchical Bayesian model which overcomes sparsity, sharing information across cells and genomic features to robustly quantify genuine biological heterogeneity. scMET can identify highly variable features that drive epigenetic heterogeneity, and perform differential methylation and variability analyses. We illustrate how scMET facilitates the characterization of epigenetically distinct cell populations and how it enables the formulation of novel hypotheses on the epigenetic regulation of gene expression. scMET is available at https://github.com/andreaskapou/scMET .


Subjects
Bayes Theorem; DNA Methylation; Epigenesis, Genetic; Epigenomics/methods; Genetic Heterogeneity; Single-Cell Analysis/methods; Software; Algorithms; Computational Biology/methods
12.
BMC Bioinformatics ; 21(1): 531, 2020 Nov 17.
Article in English | MEDLINE | ID: mdl-33203356

ABSTRACT

BACKGROUND: The large-scale availability of whole-genome sequencing profiles from bulk DNA sequencing of cancer tissues is fueling the application of evolutionary theory to cancer. From a bulk biopsy, subclonal deconvolution methods are used to determine the composition of cancer subpopulations in the biopsy sample, a fundamental step to determine clonal expansions and their evolutionary trajectories. RESULTS: In a recent work we have developed a new model-based approach to carry out subclonal deconvolution from the site frequency spectrum of somatic mutations. This new method integrates, for the first time, an explicit model for neutral evolutionary forces that participate in clonal expansions; in that work we have also shown that our method improves largely over competing data-driven methods. In this Software paper we present mobster, an open source R package built around our new deconvolution approach, which provides several functions to plot data and fit models, assess their confidence and compute further evolutionary analyses that relate to subclonal deconvolution. CONCLUSIONS: We present the mobster package for tumour subclonal deconvolution from bulk sequencing, the first approach to integrate Machine Learning and Population Genetics which can explicitly model co-existing neutral and positive selection in cancer. We showcase the analysis of two datasets, one simulated and one from a breast cancer patient, and overview all package functionalities.


Subjects
Breast Neoplasms/genetics; DNA, Neoplasm/genetics; Software; Whole Genome Sequencing; Cell Proliferation; Clone Cells; Data Analysis; Female; Genetics, Population; Humans; Machine Learning; Models, Genetic; Mutation/genetics
13.
PLoS Genet ; 16(10): e1009087, 2020 10.
Article in English | MEDLINE | ID: mdl-33048927

ABSTRACT

MeCP2 is an abundant protein in mature nerve cells, where it binds to DNA sequences containing methylated cytosine. Mutations in the MECP2 gene cause the severe neurological disorder Rett syndrome (RTT), provoking intensive study of the underlying molecular mechanisms. Multiple functions have been proposed, one of which involves a regulatory role in splicing. Here we leverage the recent availability of high-quality transcriptomic data sets to probe quantitatively the potential influence of MeCP2 on alternative splicing. Using a variety of machine learning approaches that can capture both linear and non-linear associations, we show that widely different levels of MeCP2 have a minimal effect on alternative splicing in three different systems. Alternative splicing was also apparently indifferent to developmental changes in DNA methylation levels. Our results suggest that regulation of splicing is not a major function of MeCP2. They also highlight the importance of multi-variate quantitative analyses in the formulation of biological hypotheses.


Subjects
Alternative Splicing/genetics; Methyl-CpG-Binding Protein 2/genetics; Rett Syndrome/genetics; Transcriptome/genetics; Animals; Brain/metabolism; Cytosine/metabolism; DNA (Cytosine-5-)-Methyltransferase 1/genetics; DNA (Cytosine-5-)-Methyltransferases/genetics; DNA Methylation/genetics; DNA Methyltransferase 3A; Disease Models, Animal; Humans; Methylation; Mice; Mice, Knockout; Mutation/genetics; Neurons/metabolism; Neurons/pathology; Protein Binding/genetics; Rett Syndrome/metabolism; Rett Syndrome/pathology
14.
Nat Genet ; 52(9): 898-907, 2020 09.
Article in English | MEDLINE | ID: mdl-32879509

ABSTRACT

Most cancer genomic data are generated from bulk samples composed of mixtures of cancer subpopulations, as well as normal cells. Subclonal reconstruction methods based on machine learning aim to separate those subpopulations in a sample and infer their evolutionary history. However, current approaches are entirely data driven and agnostic to evolutionary theory. We demonstrate that systematic errors occur in the analysis if evolution is not accounted for, and this is exacerbated with multi-sampling of the same tumor. We present a novel approach for model-based tumor subclonal reconstruction, called MOBSTER, which combines machine learning with theoretical population genetics. Using public whole-genome sequencing data from 2,606 samples from different cohorts, new data and synthetic validation, we show that this method is more robust and accurate than current techniques in single-sample, multiregion and longitudinal data. This approach minimizes the confounding factors of nonevolutionary methods, thus leading to more accurate recovery of the evolutionary history of human cancers.


Subjects
Neoplasms/genetics; Clonal Evolution/genetics; Genetics, Population/methods; Genomics/methods; Humans; Machine Learning; Whole Genome Sequencing/methods
15.
Nature ; 576(7787): 487-491, 2019 12.
Article in English | MEDLINE | ID: mdl-31827285

ABSTRACT

Formation of the three primary germ layers during gastrulation is an essential step in the establishment of the vertebrate body plan and is associated with major transcriptional changes [1-5]. Global epigenetic reprogramming accompanies these changes [6-8], but the role of the epigenome in regulating early cell-fate choice remains unresolved, and the coordination between different molecular layers is unclear. Here we describe a single-cell multi-omics map of chromatin accessibility, DNA methylation and RNA expression during the onset of gastrulation in mouse embryos. The initial exit from pluripotency coincides with the establishment of a global repressive epigenetic landscape, followed by the emergence of lineage-specific epigenetic patterns during gastrulation. Notably, cells committed to mesoderm and endoderm undergo widespread coordinated epigenetic rearrangements at enhancer marks, driven by ten-eleven translocation (TET)-mediated demethylation and a concomitant increase of accessibility. By contrast, the methylation and accessibility landscape of ectodermal cells is already established in the early epiblast. Hence, regulatory elements associated with each germ layer are either epigenetically primed or remodelled before cell-fate decisions, providing the molecular framework for a hierarchical emergence of the primary germ layers.


Subjects
DNA Methylation; Epigenesis, Genetic; Gastrula/cytology; Gastrula/metabolism; Gastrulation/genetics; Gene Expression Regulation, Developmental; RNA/genetics; Single-Cell Analysis; Animals; Cell Differentiation/genetics; Cell Lineage/genetics; Chromatin/genetics; Chromatin/metabolism; Demethylation; Embryoid Bodies/cytology; Endoderm/cytology; Endoderm/embryology; Endoderm/metabolism; Enhancer Elements, Genetic/genetics; Epigenome/genetics; Erythropoiesis; Factor Analysis, Statistical; Gastrula/embryology; Gastrulation/physiology; Mesoderm/cytology; Mesoderm/embryology; Mesoderm/metabolism; Mice; Pluripotent Stem Cells/cytology; Pluripotent Stem Cells/metabolism; RNA/analysis; Time Factors; Zinc Fingers
16.
PLoS Comput Biol ; 15(11): e1007442, 2019 11.
Article in English | MEDLINE | ID: mdl-31682604

ABSTRACT

Large-scale neural recording methods now allow us to observe large populations of identified single neurons simultaneously, opening a window into neural population dynamics in living organisms. However, distilling such large-scale recordings to build theories of emergent collective dynamics remains a fundamental statistical challenge. The neural field models of Wilson, Cowan, and colleagues remain the mainstay of mathematical population modeling owing to their interpretable, mechanistic parameters and amenability to mathematical analysis. Inspired by recent advances in biochemical modeling, we develop a method based on moment closure to interpret neural field models as latent state-space point-process models, making them amenable to statistical inference. With this approach we can infer the intrinsic states of neurons, such as active and refractory, solely from spiking activity in large populations. After validating this approach with synthetic data, we apply it to high-density recordings of spiking activity in the developing mouse retina. This confirms the essential role of a long lasting refractory state in shaping spatiotemporal properties of neonatal retinal waves. This conceptual and methodological advance opens up new theoretical connections between mathematical theory and point-process state-space models in neural data analysis.


Subjects
Computational Biology/methods; Neuroimaging/methods; Action Potentials/physiology; Algorithms; Animals; Bayes Theorem; Brain Mapping/methods; Data Interpretation, Statistical; Humans; Models, Neurological; Models, Theoretical; Nerve Net/physiology; Neurons/physiology
17.
Proc Math Phys Eng Sci ; 475(2229): 20190100, 2019 Sep.
Article in English | MEDLINE | ID: mdl-31611711

ABSTRACT

Fluid approximations have seen great success in approximating the macro-scale behaviour of Markov systems with a large number of discrete states. However, these methods rely on the continuous-time Markov chain (CTMC) having a particular population structure which suggests a natural continuous state-space endowed with a dynamics for the approximating process. We construct here a general method based on spectral analysis of the transition matrix of the CTMC, without the need for a population structure. Specifically, we use the popular manifold learning method of diffusion maps to analyse the transition matrix as the operator of a hidden continuous process. An embedding of states in a continuous space is recovered, and the space is endowed with a drift vector field inferred via Gaussian process regression. In this manner, we construct an ordinary differential equation whose solution approximates the evolution of the CTMC mean, mapped onto the continuous space (known as the fluid limit).

18.
Genome Biol ; 20(1): 61, 2019 03 21.
Article in English | MEDLINE | ID: mdl-30898142

ABSTRACT

Measurements of single-cell methylation are revolutionizing our understanding of epigenetic control of gene expression, yet the intrinsic data sparsity limits the scope for quantitative analysis of such data. Here, we introduce Melissa (MEthyLation Inference for Single cell Analysis), a Bayesian hierarchical method to cluster cells based on local methylation patterns, discovering patterns of epigenetic variability between cells. The clustering also acts as an effective regularization for data imputation on unassayed CpG sites, enabling transfer of information between individual cells. We show both on simulated and real data sets that Melissa provides accurate and biologically meaningful clusterings and state-of-the-art imputation performance.


Subjects
Bayes Theorem; Computational Biology/methods; DNA Methylation; Embryonic Stem Cells/metabolism; Models, Statistical; Single-Cell Analysis/methods; Algorithms; Computer Simulation; CpG Islands; Embryonic Stem Cells/cytology; Humans; Models, Genetic
19.
Methods Mol Biol ; 1935: 175-185, 2019.
Article in English | MEDLINE | ID: mdl-30758827

ABSTRACT

Single-cell RNA-seq (scRNA-seq) provides a comprehensive measurement of stochasticity in transcription, but the limitations of the technology have prevented its application to dissect variability in RNA processing events such as splicing. In this chapter, we review the challenges in splicing isoform quantification in scRNA-seq data and discuss BRIE (Bayesian regression for isoform estimation), a recently proposed Bayesian hierarchical model which resolves these problems by learning an informative prior distribution from sequence features. We illustrate the usage of BRIE with a case study on 130 mouse cells during gastrulation.


Subjects
Protein Isoforms/genetics; RNA Splicing/genetics; RNA, Small Cytoplasmic/genetics; Sequence Analysis, RNA/methods; Animals; Bayes Theorem; Gene Expression/genetics; Gene Expression Profiling/methods; High-Throughput Nucleotide Sequencing/methods; Humans; Mice; RNA/genetics; Single-Cell Analysis/methods; Software
20.
Methods Mol Biol ; 1883: 1-23, 2019.
Article in English | MEDLINE | ID: mdl-30547394

ABSTRACT

Gene regulatory networks are powerful abstractions of biological systems. Since the advent of high-throughput measurement technologies in biology in the late 1990s, reconstructing the structure of such networks has been a central computational problem in systems biology. While the problem is certainly not solved in its entirety, considerable progress has been made in the last two decades, with mature tools now available. This chapter aims to provide an introduction to the basic concepts underpinning network inference tools, attempting a categorization which highlights commonalities and relative strengths. While the chapter is meant to be self-contained, the material presented should provide a useful background to the later, more specialized chapters of this book.


Subjects
Computational Biology/methods; Data Science/methods; Gene Expression Regulation; Gene Regulatory Networks; Models, Genetic; Algorithms; Computational Biology/instrumentation; Data Science/instrumentation; Gene Expression Profiling/instrumentation; Gene Expression Profiling/methods; High-Throughput Screening Assays/instrumentation; High-Throughput Screening Assays/methods; Software